Soccer captioning: dataset, transformer-based model, and triple-level evaluation
Authors
Abstract
This work aims at generating captions for soccer videos using deep learning. The paper introduces a novel dataset, a model, and a triple-level evaluation. The dataset consists of 22k caption-clip pairs and three visual features (images, optical flow, inpainting) for 500 hours of SoccerNet videos. The model is divided into three parts: a transformer learns language, ConvNets learn vision, and a fusion of linguistic and visual features generates captions. The suggested evaluation criterion for captioning models covers three levels: syntax (the commonly used metrics such as BLEU-score and CIDEr), semantics (the quality of the descriptions for a domain expert), and corpus (the diversity of the generated captions). The paper shows that the diversity of the generated captions has improved (from 0.07 to 0.18) with semantics-related losses that prioritize selected words. Semantics-related losses and the utilization of more visual features (optical flow, inpainting) improved the normalized captioning score by 27%.
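The corpus level of the evaluation can be illustrated with a small sketch. The abstract does not say how diversity is computed, so both measures below are assumptions chosen for illustration: `caption_diversity` (a hypothetical name) is the share of distinct captions in a generated corpus, and `distinct_n` is the share of distinct word n-grams. Neither is claimed to be the paper's metric.

```python
from collections import Counter


def caption_diversity(captions):
    """Share of distinct captions in the corpus (assumed definition, not the paper's)."""
    # Lowercase and collapse whitespace so trivial variants count as the same caption.
    normalized = [" ".join(c.lower().split()) for c in captions]
    if not normalized:
        return 0.0
    return len(set(normalized)) / len(normalized)


def distinct_n(captions, n=2):
    """Share of distinct word n-grams among all n-grams in the corpus (assumed definition)."""
    ngrams = Counter()
    for caption in captions:
        tokens = caption.lower().split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0


if __name__ == "__main__":
    # Made-up captions, used only to exercise the functions.
    generated = [
        "Goal by the striker",
        "goal by the  striker",
        "Corner kick taken",
        "Yellow card shown",
    ]
    print(caption_diversity(generated))
    print(distinct_n(generated, n=2))
```

Under either measure, a model that repeats a few template sentences scores low, which is the kind of behaviour a corpus-level check is meant to expose alongside syntax metrics such as BLEU and CIDEr.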
Similar resources
teacher educator evaluation model
If the quality of the classroom teacher is vital to improving student learning, then the quality of the instructors of student teachers, that is, the teachers of teachers, is also very important and fundamental to the advancement of education. It goes without saying that a suitable teacher education system will train high-quality teachers, which in turn leads to good schools and, as a result, a more skilled workforce and better citizens for society. Instructors of student teachers play a very important role in teacher education systems throughout the world ...
End-to-End Dense Video Captioning with Masked Transformer
Dense video captioning aims to generate text descriptions for all events in an untrimmed video. This involves both detecting and describing events. Therefore, all previous methods on dense video captioning tackle this problem by building two models, i.e. an event proposal and a captioning model, for these two sub-problems. The models are either trained separately or in alternation. This prevent...
Image Captioning with Sentiment Terms via Weakly-Supervised Sentiment Dataset
Image captioning task has become a highly competitive research area with application of convolutional and recurrent neural networks, especially with the advent of long short-term memory (LSTM) architecture. However, its primary focus has been a factual description of the images, mostly objects and their actions. While such focus has demonstrated competence, describing the images with non-factua...
Evaluation of Model based Tracking with TrakMark Dataset
We benchmark two tracking methods developed in the INRIA Lagadic team with a TrakMark dataset. Since these methods are based on a 3D model based approach, we selected a dataset named “Conference Venue Package 01” that includes a 3D textured model of a scene. For the evaluation, we compute the error of 3D rotation and translation with the ground truth transformation matrix. Through these evaluat...
MAM-RNN: Multi-level Attention Model Based RNN for Video Captioning
Visual information is quite important for the task of video captioning. However, in the video, there is a lot of uncorrelated content, which may interfere with generating a correct caption. Based on this point, we attempt to exploit the visual features which are most correlated to the caption. In this paper, a Multi-level Attention Model based Recurrent Neural Network (MAM-RNN) is propose...
Journal
Journal title: Procedia Computer Science
Year: 2022
ISSN: 1877-0509
DOI: https://doi.org/10.1016/j.procs.2022.10.125